Fast read operation utilizing reduced storage of metadata in a distributed encoded storage system

ABSTRACT

A data object can be encoded into multiple encoded data fragments and stored on the backend of a distributed encoded storage system. The identifiers and metadata corresponding to each fragment of the data object can be stored in a first section of a metadata unit, and the initial part of the data object in a second section. The metadata unit is encoded into multiple metadata fragments, which are stored on the backend. The identifiers of the metadata fragments can be associated with the data object and stored on a fast frontend storage device. In response to a request to access the data object, the identifiers are used to retrieve the metadata fragments from the backend, and decode the metadata unit. The initial part of the data object is retrieved from the metadata unit and transmitted to the requesting client application to begin processing the data object.

CROSS-REFERENCES TO RELATED APPLICATION

This application is related to the U.S. patent application Ser. No. 15/961,871 filed with the U.S. Patent and Trademark Office having the same assignee and entitled Reduced Storage of Metadata in a Distributed Encoded Storage System, the entire contents of which are hereby incorporated herein by reference.

TECHNICAL FIELD

The present disclosure pertains generally to storage systems, and more specifically to fast read operations utilizing reduced storage of metadata in a distributed encoded storage system.

BACKGROUND

The rise in electronic and digital device technology has rapidly changed the way society communicates, interacts, and consumes goods and services. Modern computing devices allow organizations and users to have access to a variety of useful applications in many locations. Using such applications results in the generation of a large amount of data. Storing and retrieving the produced data is a significant challenge associated with providing useful applications and devices.

The data generated by online services and other applications can be stored at data storage facilities. As the amount of data grows, having a plurality of users sending and requesting data can result in complications that reduce efficiency and speed. Quick and reliable access in storage systems is important for good performance.

Distributed encoded storage systems typically divide each data object to be stored into a plurality of data pieces, each of which is encoded into a plurality of encoded data fragments. The encoded data fragments are spread across multiple backend storage elements, thereby providing a given level of redundancy. A distributed encoded storage system maintains metadata which identifies each stored data object, specifies where in the system each data object is stored, including where the encoded data fragments have been distributed and hence from where they can subsequently be retrieved, what type of encoding has been used, etc. For each encoded fragment of the data object, an identifier, location information and encoding information are maintained. Thus, storage of a single data object generates a large amount of associated metadata.

As noted above, a distributed encoded storage system stores the encoded data fragments on storage elements in the backend. However, because the corresponding metadata is accessed frequently and needs to be provided with a high level of responsiveness, it is typically stored on storage elements other than those of the backend, as storing it on the backend would lead to unacceptable delays. Typically the backend storage elements on which data objects are stored are in the form of hard disks, while the metadata is stored on expensive, fast, low latency storage elements, such as solid state disks (“SSDs”). The separate storage of metadata on SSDs leads to the problem of a higher cost and typically a reduced level of durability.

In addition, although conventional systems make use of expensive, fast, low latency storage elements, such as SSDs, for storage of the metadata, the overall responsiveness of read operations is still negatively impacted by the need to retrieve a sufficient number of encoded data fragments from the slower backend storage elements after retrieval of the metadata, before the client application that made the read request can access the data object. In such conventional systems, a large number of encoded fragments must be downloaded from the backend in order to decode the data object, before the client application can begin processing the data object. This adds significant latency to read operations.

It would be desirable to address at least these issues.

SUMMARY

A data object can be encoded into a plurality of encoded data fragments and stored on backend storage elements in a distributed encoded storage system. The identifiers and metadata corresponding to each encoded fragment of the data object can be stored in a single metadata unit, which can be encoded into a plurality of metadata fragments. The single metadata unit can contain the identifiers and associated metadata for all of the encoded data fragments of the data object. The metadata unit can also contain an initial part of the data object itself. Each one of the plurality of encoded metadata fragments can be transmitted to the backend of the distributed encoded data storage system for distribution across the plurality of backend storage elements, wherein the distribution of the plurality of encoded metadata fragments across the plurality of backend storage elements provides the same specific level of redundancy as the encoded data fragments, and hence the underlying data object. The identifiers of the encoded metadata fragments can be associated with the data object and stored on a low latency frontend storage device. These fragment identifiers can be quickly retrieved, and used to retrieve the encoded metadata fragments from the backend. The retrieved encoded metadata fragments can be used to decode the single metadata unit, and the initial part of the data object in the metadata unit can be used to begin the fulfilment of a read request targeting the data object, while the identifiers and metadata in the metadata unit are being used to retrieve the encoded data fragments from the backend and decode the data object. This enables the requesting client application to begin processing the contents of the data object, without the delays associated with downloading sufficient encoded data fragments to allow for the decoding operation. 
Therefore, the end-user perceived latency of read operations is reduced considerably, and the end-user perceived responsiveness of the distributed storage system generally is improved.

A system of one or more computers can be configured to perform particular operations or actions by virtue of having software, firmware, hardware, or a combination of these installed on the system, where the software, firmware and/or hardware cause(s) the system to perform the actions. One or more computer programs can be configured to perform particular operations or actions by virtue of including instructions that, when executed by data processing apparatus, cause the apparatus to perform the actions.

One general aspect includes a computer-implemented method comprising: encoding a data object into a plurality of encoded data fragments; transmitting each one of the plurality of encoded data fragments to a backend of a distributed encoded data storage system, for distribution across a plurality of backend storage elements, wherein the distribution of the plurality of encoded data fragments across the plurality of backend storage elements provides a specific level of redundancy; in response to transmitting each one of the plurality of encoded data fragments, receiving, for each one of the plurality of encoded data fragments, a corresponding fragment identifier from the backend of the distributed encoded data storage system (in another embodiment, the fragment identifiers can be generated or selected on the front end, and, e.g., provided to the backend for use); for each one of the plurality of encoded data fragments, storing the corresponding fragment identifier and associated metadata in a first section of a single metadata unit; storing an initial part of the data object in a second section of the single metadata unit; encoding the single metadata unit into a plurality of encoded metadata fragments; transmitting each one of the plurality of encoded metadata fragments to the backend of the distributed encoded data storage system for storage, wherein the distribution of the plurality of encoded metadata fragments across the plurality of backend storage elements provides the specific level of redundancy; in response to transmitting each one of the plurality of encoded metadata fragments, receiving, for each one of the plurality of encoded metadata fragments a corresponding fragment identifier from the backend of the distributed encoded data storage system (as noted above, in another embodiment, the fragment identifiers can be generated or selected on the front end); associating the fragment identifiers corresponding to encoded metadata fragments with the data 
object; storing the fragment identifiers corresponding to the encoded metadata fragments in a frontend storage element, wherein the frontend storage element provides faster access to stored content than backend storage elements; receiving a request from a client application to access the data object; retrieving the fragment identifiers corresponding to the encoded metadata fragments from the frontend storage element, responsive to receiving the request to access the data object from the client application; retrieving the encoded metadata fragments from the backend of the distributed encoded data storage system; decoding the single metadata unit using the retrieved encoded metadata fragments; retrieving the initial part of the data object from the second section of the single metadata unit; and transmitting the initial part of the data object to the client application to begin processing the data object.

Other embodiments of this aspect include corresponding computer systems, system means, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.

Some implementations may optionally include one or more of the following features: that transmitting the initial part of the data object to the client application to begin processing the data object further comprises transmitting the initial part of the data object to the client application to begin processing the data object, prior to retrieving the encoded data fragments from the backend of the distributed encoded data storage system. That transmitting the initial part of the data object to the client application to begin processing the data object further comprises transmitting the initial part of the data object to the client application to begin processing the data object, concurrently with retrieving the encoded data fragments from the backend of the distributed encoded data storage system. That to begin processing the data object further comprises displaying a preview of the data object on a screen of a computing device and/or launching an end-user application for opening the data object. That the initial part of the data object further comprises a header and a subsection of content of the data object, and/or information concerning a format of the data object and a subsection of contents of the data object. That receiving a request from a client application to access the data object further comprises receiving a read request targeting the data object from the client application. The features retrieving, for each one of the plurality of encoded data fragments, the corresponding fragment identifier and associated metadata from the first section of the single metadata unit; retrieving encoded data fragments from the backend of the distributed encoded data storage system, utilizing corresponding fragment identifiers and associated metadata; and transmitting encoded data fragments to the client application to decode the data object, subsequently to transmitting the initial part of the data object to the client application to begin processing the data object. 
The features retrieving, for each one of the plurality of encoded data fragments, the corresponding fragment identifier and associated metadata from the first section of the single metadata unit; retrieving encoded data fragments from the backend of the distributed encoded data storage system, utilizing corresponding fragment identifiers and associated metadata; decoding the data object using the retrieved encoded data fragments; and transmitting the data object to the client application, subsequently to transmitting the initial part of the data object to the client application to begin processing the data object. That the backend storage elements further comprise electromechanical storage devices. That the frontend storage element further comprises a solid state storage device, such as flash memory and/or dynamic random access memory. That providing a specific level of redundancy further comprises: providing a predetermined level of storage redundancy associated with the distributed encoded data storage system. That metadata associated with each specific one of the plurality of encoded data fragments further comprises, a storage location of the specific encoded data fragment and encoding information concerning the specific encoded data fragment. The features transmitting a copy of content of the frontend storage element to the backend of the distributed encoded data storage system for storage; retrieving the stored copy from the backend, responsive to a loss of content of the frontend storage element; and storing the retrieved copy on the frontend storage element. The features caching content of at least one electromechanical backend storage element on the solid state frontend storage element. That in response to transmitting each one of the plurality of encoded data fragments, receiving, for each one of the plurality of encoded data fragments, a corresponding fragment identifier from the backend of the distributed encoded data storage system. 
That in response to transmitting each one of the plurality of encoded metadata fragments, receiving, for each one of the plurality of encoded metadata fragments, a corresponding fragment identifier from the backend of the distributed encoded data storage system. The feature that each encoded metadata fragment is of a same fragment format as the encoded data fragments.

Note that the above list of features is not all-inclusive and many additional features and advantages are contemplated and fall within the scope of the present disclosure. Moreover, the language used in the present disclosure has been principally selected for readability and instructional purposes, and not to limit the scope of the subject matter disclosed herein.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram of a distributed encoded storage system in which a data object retrieval manager can operate, according to one embodiment.

FIG. 2 is a diagram illustrating the operation of a data object retrieval manager, according to one embodiment.

FIG. 3 is a flowchart illustrating the operation of a data object retrieval manager, according to one embodiment.

The Figures depict various embodiments for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein.

DETAILED DESCRIPTION

The present disclosure describes technology, which may include methods, systems, apparatuses, computer program products, and other implementations, for executing read operations of data objects in a distributed encoded storage system, in which identifiers and metadata corresponding to each distributed encoded fragment of a data object are stored in a single metadata unit, which can be encoded into a plurality of metadata fragments. The single metadata unit can contain the identifiers and associated metadata for all of the encoded data fragments of the data object. The metadata unit can also contain an initial part of the data object itself. Each one of the plurality of encoded metadata fragments can be transmitted to the backend of the distributed encoded data storage system for distribution across the plurality of backend storage elements, wherein the distribution of the plurality of encoded metadata fragments across the plurality of backend storage elements provides the same specific level of redundancy as the encoded data fragments, and hence the underlying data object. The identifiers of the encoded metadata fragments can be associated with the data object and stored on a low latency frontend storage device. These fragment identifiers can be quickly retrieved, and used to retrieve the encoded metadata fragments from the backend. The retrieved encoded metadata fragments can be used to decode the single metadata unit, and the initial part of the data object in the metadata unit can be used to begin the fulfilment of a read request targeting the data object, while the identifiers and metadata in the metadata unit are being used to retrieve the encoded data fragments and decode the data object. This enables the requesting client application to begin processing the contents of the data object, without the delays associated with downloading sufficient encoded data fragments to allow for the decoding operation.
Therefore, the end-user perceived latency of read operations is reduced considerably, and the end-user perceived responsiveness of the distributed storage system generally is improved.

FIG. 1 illustrates an exemplary datacenter 109 in a distributed encoded storage system 100 in which a data object retrieval manager 101 can operate, according to one embodiment. In the illustrated distributed encoded storage system 100, datacenter 109 comprises storage servers 105A, 105B and 105N, which are communicatively coupled via a network 107. A data object retrieval manager 101 is illustrated as residing on storage server 105A. It is to be understood that the data object retrieval manager 101 can reside on more, fewer or different computing devices, and/or can be distributed between multiple computing devices, as desired. In FIG. 1, storage server 105A is further depicted as having storage devices 160A(1)-(N) attached, storage server 105B is further depicted as having storage devices 160B(1)-(N) attached, and storage server 105N is depicted with storage devices 160N(1)-(N) attached. It is to be understood that storage devices 160A(1)-(N), 160B(1)-(N) and 160N(1)-(N) can be instantiated as electromechanical storage such as hard disks, solid state storage such as flash memory, other types of storage media, and/or combinations of these.

Although three storage servers 105A-N each coupled to three devices 160(1)-(N) are illustrated for visual clarity, it is to be understood that the storage servers 105A-N can be in the form of rack mounted computing devices, and datacenters 109 can comprise many large storage racks each housing a dozen or more storage servers 105, hundreds of storage devices 160 and a fast network 107. It is further to be understood that although FIG. 1 only illustrates a single datacenter 109, a distributed encoded storage system can comprise multiple datacenters, including datacenters located in different cities, countries and/or continents.

It is to be understood that although the embodiment described in conjunction with FIGS. 2-3 is directed to object storage, in other embodiments the data object retrieval manager 101 can operate in the context of other storage architectures. As an example of another possible storage architecture according to some embodiments, server 105A is depicted as also being connected to a SAN fabric 170 which supports access to storage devices 180(1)-(N). Intelligent storage array 190 is also shown as an example of a specific storage device accessible via SAN fabric 170. As noted above, SAN 170 is shown in FIG. 1 only as an example of another possible architecture to which the data object retrieval manager 101 might be applied in another embodiment. In yet other embodiments, shared storage can be implemented using FC and iSCSI (not illustrated) instead of a SAN fabric 170.

Turning to FIG. 2, in one example embodiment, the data object retrieval manager 101 executes a fast retrieval of a data object 201 stored as a plurality of encoded data fragments 203 on the backend 211 of the distributed encoded data storage system 100, redundantly distributed across multiple backend storage elements 213. As described in detail in related application WDA-3569-US, the data object retrieval manager 101 first divides a data object 201 (for example of size 256 MB, although data objects 201 can be of different sizes in different embodiments) into a plurality of data pieces 202. FIG. 2 illustrates the data object 201 being divided into four data pieces (202A, 202B, 202C and 202D) for clarity of illustration. Typically, a data object 201 would be divided into (many) more data pieces 202 in practice. For example, each data piece 202 could be of size 1 MB, although data pieces 202 can be of different sizes in different embodiments.

The data object retrieval manager 101 encodes each one of these data pieces 202 into a plurality of corresponding encoded data fragments 203, which are distributed across multiple backend storage elements 213 to provide a given level of redundancy. For example, FIG. 2 illustrates each separate data piece 202A-D being encoded into three separate encoded data fragments, e.g., 203A1, 203A2 and 203A3, etc. It is to be understood that three is just an example number of data fragments 203, and data pieces 202 can be divided into more (or fewer) data fragments 203 as desired. It is to be further understood that the specific encoding format used to encode the data pieces 202 into encoded data fragments 203 can vary between embodiments, as can the size of the encoded data fragments 203.
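The division and encoding described above can be illustrated with a minimal Python sketch. The encoding shown here (two halves of a piece plus an XOR parity fragment, so that any two of the three fragments suffice to rebuild the piece) is only a stand-in chosen for brevity; the disclosure does not prescribe a particular encoding format. The function names split_into_pieces, encode_piece and decode_piece are hypothetical and do not appear in the disclosure.

```python
from typing import List, Optional


def split_into_pieces(data: bytes, piece_size: int) -> List[bytes]:
    """Divide a data object into consecutive pieces of at most piece_size bytes."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def _xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_piece(piece: bytes) -> List[bytes]:
    """Encode one piece into three fragments: two halves and their XOR parity.

    Assumption: this simple parity scheme stands in for whatever encoding
    format a given embodiment actually uses.
    """
    half = (len(piece) + 1) // 2
    first = piece[:half]
    second = piece[half:].ljust(half, b"\x00")  # pad so both halves match in length
    return [first, second, _xor(first, second)]


def decode_piece(fragments: List[Optional[bytes]], piece_len: int) -> bytes:
    """Rebuild a piece from any two of its three fragments (None marks a lost one)."""
    first, second, parity = fragments
    if first is None:
        first = _xor(second, parity)
    elif second is None:
        second = _xor(first, parity)
    # Drop any padding that was added to the second half during encoding.
    return (first + second)[:piece_len]
```

In this sketch the original piece length is passed to decode_piece explicitly; in the described system that kind of information would be part of the metadata 207 kept for the fragments.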

These encoded data fragments 203 are provided to the backend 211 of the distributed encoded data storage system 100, where they are redundantly distributed across multiple backend storage elements 213, thereby providing a specific level of redundancy (e.g., a predetermined level of storage redundancy associated with the distributed encoded data storage system 100). It is to be understood that low latency frontend storage elements 209 may be in the form of SSDs such as flash memory devices, whereas backend storage elements 213 may be in the form of hard disks or other forms of electromechanical storage devices. Because solid state storage has significantly faster access times than the electromechanical storage, frontend storage elements 209 typically provide faster access to stored content than backend storage elements 213. In addition, solid state storage is significantly more expensive per megabyte than electromechanical storage.

For each encoded data fragment 203 transmitted to the backend 211 for storage, the backend 211 returns a corresponding fragment identifier 205 (as noted above, in other embodiments fragment identifiers can be generated/selected on the frontend and provided to the backend for use). In addition, metadata 207 concerning each encoded data fragment 203 stored on the backend 211 is generated, such as the storage location of the specific encoded data fragment 203 (e.g., information enabling the subsequent retrieval of the encoded data fragment 203), encoding information concerning the specific encoded data fragment 203 (e.g., the encoding format used and any relevant parameters), relationship information between the specific encoded data fragment 203 and its corresponding data piece 202, relationship information between the specific encoded data fragment 203 and other encoded data fragments 203 and/or relationship information between the multiple data pieces 202 of the data object 201 (e.g., information enabling the reconstruction of the underlying data object 201 from the plurality of fragments 203), etc. It is to be understood that the specific format and/or content of metadata 207 concerning encoded data fragments 203 can vary between embodiments.

For each encoded data fragment 203, the data object retrieval manager 101 stores the corresponding fragment identifier 205 and associated metadata 207 in a single metadata unit 214, which is then encoded into a plurality of encoded metadata fragments, e.g., 215A, 215B and 215C. It is to be understood that the metadata unit 214 is analogous to a data piece 202, and the encoded metadata fragments 215 can be of the same fragment format as the encoded data fragments 203. In addition, the metadata unit 214 can be the same size (e.g., 1 MB) as the data pieces 202. In one embodiment, in order to construct the metadata unit 214, as fragment identifiers 205 are returned from the backend 211, the fragment identifiers 205 and corresponding metadata 207 are stored in a temporary buffer. The contents of the temporary buffer are then copied into the single metadata unit 214.

The fragment identifiers 205 and associated metadata 207 can be stored in a first section of the metadata unit 214, and an initial part of the data object 201 can be stored in a second section of the metadata unit 214. For example, in an embodiment in which the size of the metadata unit 214 is, e.g., 1 MB, a first 0.5 MB section of the metadata unit 214 could be used to store the fragment identifiers 205 and metadata 207 associated with the various encoded data fragments 203, and a second 0.5 MB section of the metadata unit 214 could be used to store the initial part of the data object 201. (One megabyte is just an example of a possible size for the metadata unit 214, and other sizes can be used in other embodiments). Continuing with the example described above, the metadata unit 214 would contain both the identifiers 205 and metadata 207 of all the encoded data fragments 203, and the initial part of the data object 201. Note that the initial part of the data object 201 may comprise a header and additional content. When a client application 217 makes a request to access the data object 201 (e.g., a read operation), the data object retrieval manager 101 can quickly provide the initial part of the data object 201 (e.g., header and content) to the requesting client application 217, and the client application 217 can begin processing the initial part of the data object 201 while the various encoded data fragments 203 of the data object 201 are being retrieved from the backend 211.
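One possible two-section layout of the metadata unit 214 can be sketched as follows. This is an illustration under stated assumptions, not a format taken from the disclosure: the first section is assumed to hold a 4-byte length prefix followed by JSON-serialized fragment records and zero padding, and the second section is assumed to hold the leading bytes of the data object. The names build_metadata_unit and parse_metadata_unit, and the record fields used in the usage example, are hypothetical.

```python
import json
import struct
from typing import Dict, List, Tuple


def build_metadata_unit(records: List[Dict], data_object: bytes,
                        unit_size: int, first_section_size: int) -> bytes:
    """Pack fragment records and the initial part of the object into one unit."""
    first = json.dumps(records).encode("utf-8")
    if len(first) + 4 > first_section_size:
        raise ValueError("fragment records do not fit in the first section")
    # First section: 4-byte big-endian length prefix, the records, zero padding.
    first_section = (struct.pack(">I", len(first)) + first).ljust(
        first_section_size, b"\x00")
    # Second section: as much of the start of the object as the unit has room for.
    initial_part = data_object[:unit_size - first_section_size]
    return first_section + initial_part


def parse_metadata_unit(unit: bytes,
                        first_section_size: int) -> Tuple[List[Dict], bytes]:
    """Return (fragment records, initial part of the data object)."""
    (length,) = struct.unpack(">I", unit[:4])
    records = json.loads(unit[4:4 + length].decode("utf-8"))
    return records, unit[first_section_size:]


# Usage with illustrative values (a 160-byte unit split 128/32).
example_records = [{"id": "f1", "location": "disk-3", "encoding": "xor"}]
example_object = b"HEADER" + b"x" * 100
example_unit = build_metadata_unit(example_records, example_object, 160, 128)
```

With the example sizes from the text, unit_size would be 1 MB and first_section_size 0.5 MB; the split between the two sections is a parameter here because, as noted, the sections need not be of equal size.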

More specifically, before or simultaneously with the retrieval of the data object 201 from the backend 211, the initial part of the data object 201 can be quickly retrieved from the metadata unit 214 and provided to the requesting client application 217. The client application 217 can then use the, e.g., header and initial content of the data object 201, to begin processing the data object 201 as described in more detail below, for example in order to show a suitable preview of the data object 201 on the screen of a computing device, to launch a suitable end-user application for opening the data object 201 (e.g., a word processor, spreadsheet program, graphics program, database client or other end-user application as desired), etc. It is to be understood that as used herein, the language “first section” and “second section” of the metadata unit 214 does not denote that the sections are positioned within the metadata unit 214 in a specific order, but only that these are two specific sections thereof. It is also to be understood that although the above example describes the two sections of the metadata unit 214 as being of equal size, in some embodiments more or less space can be used for one section or the other as desired. In addition, the exact format and content of the initial part of the data object 201 can vary between embodiments, but may contain a header and/or other information concerning the format of the data object 201, as well as a subset of the substantive contents of the data object 201 itself.

The data object retrieval manager 101 transmits the multiple encoded metadata fragments 215 encoded from the metadata unit 214 to the backend 211 of the distributed encoded data storage system 100 for storage, as was done with the encoded data fragments 203. In response to transmitting each encoded metadata fragment 215, a corresponding fragment identifier 205 is received from the backend 211. In other words, the backend 211 returns a fragment identifier 205 for each encoded metadata fragment 215, as it did for the encoded data fragments 203 (as noted above, in other embodiments fragment identifiers can be generated/selected on the frontend and provided to the backend for use). And like the encoded data fragments 203, the backend 211 redundantly distributes the encoded metadata fragments 215 across multiple backend storage elements 213, thereby providing the same level of redundancy for the encoded metadata fragments 215 as for the encoded data fragments 203 (e.g., the predetermined level of storage redundancy associated with the distributed encoded data storage system 100).

The metadata fragment identifiers 205 are associated with the data object 201, and stored on a frontend storage element 209, which as noted above provides faster access to stored content than backend storage elements 213. It is to be understood that the metadata fragment identifiers 205, which are just the fragment identifiers 205 that identify the encoded metadata fragments 215, are much smaller (for example, 16 bytes) than the identifiers 205 and associated metadata 207 (e.g., hundreds of kilobytes) corresponding to all of the encoded data fragments 203. Thus, the space consumed per data object 201 of the fast, low latency, expensive frontend storage elements 209 is reduced.
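The reduction in frontend storage can be made concrete with a short calculation using the example figures given above (a 256 MB data object, 1 MB data pieces, three encoded data fragments per piece, three encoded metadata fragments). Two values in the sketch are assumptions made for illustration only: that the 16-byte figure applies to each fragment identifier, and that each encoded data fragment carries 256 bytes of associated metadata. The function name frontend_bytes_per_object is hypothetical.

```python
def frontend_bytes_per_object(object_size: int, piece_size: int,
                              fragments_per_piece: int, id_size: int,
                              metadata_size: int, metadata_fragments: int):
    """Compare frontend bytes per object with and without the metadata unit.

    Returns (conventional, reduced), where 'conventional' keeps every data
    fragment identifier and its metadata on the frontend, and 'reduced' keeps
    only the identifiers of the encoded metadata fragments there.
    """
    pieces = -(-object_size // piece_size)  # ceiling division
    data_fragments = pieces * fragments_per_piece
    conventional = data_fragments * (id_size + metadata_size)
    reduced = metadata_fragments * id_size
    return conventional, reduced


MB = 1024 * 1024
# 256 pieces x 3 fragments = 768 data fragments.
conventional, reduced = frontend_bytes_per_object(
    object_size=256 * MB, piece_size=MB, fragments_per_piece=3,
    id_size=16,          # assumed size of each fragment identifier
    metadata_size=256,   # assumed metadata bytes per data fragment
    metadata_fragments=3)
```

Under these assumptions, conventional is 208,896 bytes (about 204 KB, consistent with the "hundreds of kilobytes" mentioned above) and reduced is 48 bytes.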

Because the fragment identifiers 205 that identify the metadata fragments 215 are stored on fast, low latency frontend storage 209, they can be retrieved quickly. Because the metadata fragment identifiers 205 are linked to the data object 201, they can be used to retrieve the data object 201 from the backend 211. More specifically, when the data object retrieval manager 101 receives a read (or other access) request targeting the data object 201 from a client application 217, the data object retrieval manager 101 can retrieve the metadata fragment identifiers 205 associated with the data object 201 from a low latency frontend storage element 209, and use the metadata fragment identifiers 205 to retrieve the corresponding metadata fragments 215 from the backend 211. The data object retrieval manager 101 can then decode the single metadata unit 214 using the retrieved encoded metadata fragments 215. As explained above, this metadata unit 214 contains the identifiers 205 and associated metadata 207 for all of the encoded data fragments 203 of the data object 201.

The data object retrieval manager 101 can use the encoded data fragment identifiers 205 and associated metadata 207 to retrieve the encoded data fragments 203 of the data object 201 from the backend 211. The metadata unit 214 also contains the initial part of the data object 201. As described above, the data object retrieval manager 101 can retrieve the initial part of the data object 201 from the metadata unit 214, and transmit the initial part of the data object 201 to the requesting client application 217 (e.g., the client application 217 that executed the read operation targeting the data object 201). In other words, the data object retrieval manager 101 can use the initial part of the data object 201 to begin the fulfilment of an access request, before or while the encoded data fragments 203 are being retrieved from the backend 211. More specifically, during a retrieval operation of the data object 201, the metadata unit 214 is retrieved, and the initial part of the data object 201, which is contained in the metadata unit 214, can be provided immediately to the requesting client application 217. In this way the client application 217 can begin processing the data object 201 with reduced latency, because the initial part of the data object 201 can be provided to the client application 217 before or concurrently with the retrieval of the encoded data fragments 203.

As explained above, this initial portion of the data object 201 may comprise a header and/or other suitable information for initiating the processing of the data object 201, as well as content that can be used for, e.g., operations such as displaying a preview or launching a suitable end-user application, etc. While or after this occurs, the data object retrieval manager 101 can retrieve the corresponding fragment identifiers 205 and associated metadata 207 for the encoded data fragments 203 from the first section of the single metadata unit 214, and use this information to retrieve the encoded data fragments 203 from the backend 211 of the distributed encoded data storage system 100. In due course, the data object retrieval manager 101 can decode the data object 201 using the encoded data fragments 203, and provide the data object 201 to the requesting client application 217. In another embodiment, the data object retrieval manager 101 transmits the encoded data fragments 203 to the client application 217 to decode the data object 201. Either way, the transmission of the data object 201 or encoded data fragments 203 to the client application 217 can be subsequent to the transmission of the initial part of the data object 201. Because the initial part of the data object 201 can be provided to the client application 217 after retrieving the encoded metadata fragments 215 from the backend 211 and decoding the metadata unit 214, the initial part of the data object 201 can be provided to and processed by the client application 217 much more quickly than the data object 201 as a whole.
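By way of non-limiting illustration, the read sequence described above can be sketched in Python as follows. The sketch rests on several assumptions that are not part of the disclosure: the in-memory dictionaries stand in for the backend 211 and the frontend storage element 209, the `decode` function uses plain concatenation in place of an actual erasure decoding, and the metadata unit layout (a 4-byte length, a JSON first section, a raw second section) as well as all names are hypothetical.

```python
import json
from typing import Dict, Iterator, List


def decode(fragments: List[bytes]) -> bytes:
    """Stand-in for erasure decoding: fragments are plain stripes, joined in order."""
    return b"".join(fragments)


def read_object(name: str,
                frontend: Dict[str, List[str]],
                backend: Dict[str, bytes]) -> Iterator[bytes]:
    """Yield the initial part first, then the fully decoded data object."""
    # Fast lookup on the frontend, then fetch and decode the metadata unit.
    metadata_ids = frontend[name]
    unit = decode([backend[fid] for fid in metadata_ids])

    # Hypothetical layout: 4-byte length, JSON first section, raw second section.
    first_len = int.from_bytes(unit[:4], "big")
    first_section = json.loads(unit[4:4 + first_len])
    initial_part = unit[4 + first_len:]

    # The client can start processing before any data fragment is fetched.
    yield initial_part

    # Use the first section to fetch and decode the encoded data fragments.
    data_ids = [entry["id"] for entry in first_section]
    yield decode([backend[fid] for fid in data_ids])


# Tiny in-memory setup standing in for the backend and the frontend storage.
backend = {"d0": b"HEADER|pay", "d1": b"load bytes"}
first = json.dumps([{"id": "d0", "location": "node-a"},
                    {"id": "d1", "location": "node-b"}]).encode()
unit = len(first).to_bytes(4, "big") + first + b"HEADER|"
backend["m0"], backend["m1"] = unit[:20], unit[20:]
frontend = {"report.bin": ["m0", "m1"]}

parts = list(read_object("report.bin", frontend, backend))
```

Because `read_object` is a generator, a caller can forward the first yielded value to the client application immediately and only then pull the second value, which triggers retrieval of the data fragments.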

Because the amount of frontend storage space utilized is minimal, content stored on the backend 211 (e.g., encoded data fragments 203) can be cached to frontend storage in one embodiment, thereby improving overall performance. In other words, a given amount of low latency frontend storage space can be used to cache backend data such as encoded data fragments 203, thereby enabling faster access times when cached data are retrieved, for example during a read operation. The amount of frontend storage space to utilize as a cache in this context is a variable design parameter.
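As one possible illustration of such caching, the following Python sketch keeps recently used backend fragments in a bounded cache. The least-recently-used eviction policy, the class name `FragmentCache`, and the use of a fragment count as the capacity are assumptions made for the example only; the disclosure does not prescribe a particular caching policy.

```python
from collections import OrderedDict
from typing import Dict


class FragmentCache:
    """LRU cache of backend fragments held in frontend storage (hypothetical)."""

    def __init__(self, backend: Dict[str, bytes], capacity: int) -> None:
        self.backend = backend
        self.capacity = capacity  # design parameter: number of cached fragments
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.backend_reads = 0

    def get(self, fragment_id: str) -> bytes:
        if fragment_id in self.entries:
            self.entries.move_to_end(fragment_id)  # mark as most recently used
            return self.entries[fragment_id]
        self.backend_reads += 1  # slow path: fetch from the backend
        fragment = self.backend[fragment_id]
        self.entries[fragment_id] = fragment
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        return fragment


cache = FragmentCache({"f1": b"a", "f2": b"b", "f3": b"c"}, capacity=2)
for fid in ["f1", "f2", "f1", "f3", "f2"]:
    cache.get(fid)
```

The `capacity` argument plays the role of the variable design parameter: a larger value trades more frontend storage space for fewer backend reads.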

In one embodiment, low latency frontend storage elements 209 may be in the form of media other than flash memory, such as dynamic random access memory (DRAM). DRAM is typically faster but more expensive than flash memory. Unlike flash memory, DRAM is volatile, quickly losing its data without power. In one embodiment a copy of the contents of a frontend storage element 209 (e.g., the metadata fragment identifiers 205, which are small in size) can be provided to the backend 211 for storage. This enables the contents of the frontend storage element 209 to be rebuilt from the copy stored on the backend 211, for example in the case where the contents of the frontend storage are lost due to, e.g., a lapse in supplied power to volatile storage elements such as DRAM.
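The backup and rebuild described above might look like the following Python sketch. The reserved key `BACKUP_KEY`, the JSON serialization, and the dictionary stand-ins for the frontend storage element and the backend are all hypothetical choices made for illustration.

```python
import json
from typing import Dict, List

BACKUP_KEY = "frontend-index-backup"  # hypothetical reserved backend key


def backup_index(frontend: Dict[str, List[str]], backend: Dict[str, bytes]) -> None:
    """Store a copy of the (small) frontend index on the backend."""
    backend[BACKUP_KEY] = json.dumps(frontend, sort_keys=True).encode()


def rebuild_index(backend: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Rebuild the frontend index from the copy stored on the backend."""
    return json.loads(backend[BACKUP_KEY].decode())


backend: Dict[str, bytes] = {}
original = {"report.bin": ["m0", "m1"], "video.mp4": ["m2", "m3"]}
frontend = dict(original)
backup_index(frontend, backend)

frontend = {}  # simulate loss of volatile frontend contents (e.g., DRAM power loss)
frontend = rebuild_index(backend)
```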

Note that even where the metadata fragment identifiers 205 are distributed across multiple frontend storage elements for redundancy (e.g., multiple frontend storage elements 209 per datacenter 109, or across multiple datacenters 109), the total amount of low latency frontend storage space utilized is still orders of magnitude less than in conventional systems in which all of the metadata 207 is stored on the frontend 209.

In addition, the encoded metadata fragments 215 can be processed by the backend 211 in the same or a similar way as any other encoded fragment (e.g., the encoded data fragments 203). Thus, the encoded metadata fragments 215 can be stored on the backend 211 in a way that provides the same level of redundancy as is provided for the data object 201.

Thus, the use of the data object retrieval manager 101 as described above ensures that the storage capacity per data object 201 for metadata 207 on expensive low latency storage elements 209 with a high level of responsiveness is reduced, without compromising system level responsiveness. The overall redundancy level of the metadata 207 and scalability is also improved. The client application 217 does not experience the delays conventionally associated with downloading sufficient encoded data fragments 203 to allow for a decoding operation in order to initiate processing of the data object 201. Therefore, the end-user perceived latency of a read operation is significantly reduced.

FIG. 3 is a flowchart illustrating steps that may be performed by the data object retrieval manager 101, according to one embodiment. The data object retrieval manager 101 encodes 301 a data object 201 into a plurality of encoded data fragments 203. The data object retrieval manager 101 transmits 303 the plurality of encoded data fragments 203 to the backend 211 of the distributed encoded data storage system 100, for distribution across a plurality of backend storage elements 213, such that the distribution provides a specific level of redundancy. The data object retrieval manager 101 may receive 305 a corresponding fragment identifier 205 for each one of the plurality of encoded data fragments 203, from the backend 211 of the distributed encoded data storage system 100 (as noted above, in other embodiments fragment identifiers can be generated/selected on the frontend and provided to the backend for use). The data object retrieval manager 101 stores 307 the corresponding fragment identifier 205 and associated metadata 207 for each one of the plurality of encoded data fragments 203 in a first section of a single metadata unit 214. The data object retrieval manager 101 also stores 309 an initial part of the data object 201 in a second section of the metadata unit 214. The data object retrieval manager 101 encodes 310 the single metadata unit 214 into a plurality of encoded metadata fragments 215 (note that metadata fragments 215 may but need not be of the same common fragment format as the encoded data fragments 203). The data object retrieval manager 101 transmits 311 each one of the plurality of encoded metadata fragments 215 to the backend 211 of the distributed encoded data storage system 100, where it is stored so as to provide the same specific level of redundancy as the encoded data fragments 203, and hence the underlying data object 201.
In response to transmitting each one of the plurality of encoded metadata fragments 215, the data object retrieval manager 101 receives 313 a corresponding fragment identifier 205 from the backend 211 of the distributed encoded data storage system 100 (once again, as noted above, in other embodiments fragment identifiers can be generated/selected on the frontend and provided to the backend for use). The data object retrieval manager 101 associates 315 the fragment identifiers 205 corresponding to the encoded metadata fragments 215 with the data object 201, and stores 317 these fragment identifiers 205 on a low latency frontend storage element 209, from which they can be quickly retrieved and used to obtain the data object 201. As explained above, frontend storage elements 209 (e.g., SSDs) can provide faster access to stored content than backend storage elements 213 (e.g., hard disks). The data object retrieval manager 101 receives 319 a request from a client application 217 to access the data object 201, and in response retrieves 321 the fragment identifiers 205 from the frontend storage element 209. The data object retrieval manager 101 proceeds to retrieve 323 the encoded metadata fragments 215 from the backend 211 of the distributed encoded data storage system 100. The data object retrieval manager 101 then decodes 324 the single metadata unit 214 using the retrieved encoded metadata fragments 215, and retrieves 325 the initial part of the data object 201 from the second section of the single metadata unit 214. The data object retrieval manager 101 then transmits 327 the initial part of the data object 201 to the client application 217 to begin processing the data object 201.
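The storing steps of FIG. 3 can be illustrated with the following minimal Python sketch, in which the metadata unit is encoded with the same scheme as the data object. Several elements are assumptions and not part of the disclosure: the encoding is a simple stand-in (k stripes plus one XOR parity fragment, tolerating the loss of one fragment) rather than a production erasure code, fragment identifiers are generated locally rather than received from the backend, and the unit layout, the `index` metadata field, and all function names are hypothetical.

```python
import json
from functools import reduce
from typing import Dict, List, Optional


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Stand-in code: k equal stripes plus one XOR parity fragment."""
    framed = len(data).to_bytes(4, "big") + data  # record the original length
    size = -(-len(framed) // k)                   # ceiling division
    framed = framed.ljust(size * k, b"\0")
    stripes = [framed[i * size:(i + 1) * size] for i in range(k)]
    return stripes + [reduce(xor, stripes)]


def decode(fragments: List[Optional[bytes]]) -> bytes:
    """Recover the data when at most one fragment is missing (None)."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("too many fragments lost")
    fragments = list(fragments)
    if missing:
        present = [f for f in fragments if f is not None]
        fragments[missing[0]] = reduce(xor, present)  # XOR of the rest
    framed = b"".join(fragments[:-1])                 # drop the parity fragment
    length = int.from_bytes(framed[:4], "big")
    return framed[4:4 + length]


def store(fragments: List[bytes], backend: Dict[str, bytes], prefix: str) -> List[str]:
    """Store fragments on the backend; the id scheme here is hypothetical."""
    ids = []
    for i, fragment in enumerate(fragments):
        fid = f"{prefix}-{i}"
        backend[fid] = fragment
        ids.append(fid)
    return ids


def write_object(name: str, data: bytes, frontend: Dict[str, List[str]],
                 backend: Dict[str, bytes], k: int = 3, initial_len: int = 8) -> None:
    data_ids = store(encode(data, k), backend, f"{name}/data")           # 301-305
    first = json.dumps([{"id": fid, "index": i}
                        for i, fid in enumerate(data_ids)]).encode()     # 307
    unit = len(first).to_bytes(4, "big") + first + data[:initial_len]    # 309
    frontend[name] = store(encode(unit, k), backend, f"{name}/meta")     # 310-317


frontend: Dict[str, List[str]] = {}
backend: Dict[str, bytes] = {}
write_object("report.bin", b"HEADER|payload bytes", frontend, backend)

del backend["report.bin/meta-1"]  # simulate losing one backend storage element
unit = decode([backend.get(fid) for fid in frontend["report.bin"]])
first_len = int.from_bytes(unit[:4], "big")
initial_part = unit[4 + first_len:]
```

In this sketch only the short list of metadata fragment identifiers reaches the frontend dictionary, and the metadata unit survives the loss of one fragment in the same way as the data object does.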

FIGS. 1-2 illustrate a data object retrieval manager 101 residing on a single storage server 105. It is to be understood that this is just an example. The functionalities of the data object retrieval manager 101 can be implemented on other computing devices in other embodiments, or can be distributed between multiple computing devices. It is to be understood that although the data object retrieval manager 101 is illustrated in FIG. 1 as a standalone entity, the illustrated data object retrieval manager 101 represents a collection of functionalities, which can be instantiated as a single or multiple modules on one or more computing devices as desired.

It is to be understood that the data object retrieval manager 101 can be instantiated as one or more modules (for example as object code or executable images) within the system memory (e.g., RAM, ROM, flash memory) of any computing device, such that when the processor of the computing device processes a module, the computing device executes the associated functionality. As used herein, the terms “computer system,” “computer,” “client,” “client computer,” “server,” “server computer” and “computing device” mean one or more computers configured and/or programmed to execute the described functionality. Additionally, program code to implement the functionalities of the data object retrieval manager 101 can be stored on computer-readable storage media. Any form of tangible computer readable storage medium can be used in this context, such as magnetic or optical storage media. As used herein, the term “computer readable storage medium” does not mean an electrical signal separate from an underlying physical medium.

Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.

The embodiments illustrated herein are described in enough detail to enable the disclosed teachings to be practiced. Other embodiments may be used and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. The Detailed Description, therefore, is not to be taken in a limiting sense, and the scope of various embodiments is defined by the below claims, along with the full range of equivalents to which such claims are entitled.

As used herein, the term “or” may be construed in either an inclusive or exclusive sense. Moreover, plural instances may be provided for resources, operations, or structures described herein as a single instance. Additionally, boundaries between various resources, operations, modules, engines, and data stores are somewhat arbitrary, and particular operations are illustrated in a context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within a scope of various embodiments of the present disclosure. In general, structures and functionality presented as separate resources in the example configurations may be implemented as a combined structure or resource. Similarly, structures and functionality presented as a single resource may be implemented as separate resources. These and other variations, modifications, additions, and improvements fall within a scope of embodiments of the present disclosure as represented by the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

The foregoing description, for the purpose of explanation, has been described with reference to specific example embodiments. The illustrative discussions above are not intended to be exhaustive or to limit the possible example embodiments to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The example embodiments were chosen and described in order to best explain the principles involved and their practical applications, to thereby enable others to best utilize the various example embodiments with various modifications as are suited to the particular use contemplated.

Note that, although the terms “first,” “second,” and so forth may be used herein to describe various elements, these elements are not to be limited by these terms. These terms are only used to distinguish one element from another. For example, a first contact could be termed a second contact, and, similarly, a second contact could be termed a first contact, without departing from the scope of the present example embodiments. The first contact and the second contact are both contacts, but they are not the same contact.

The terminology used in the description of the example embodiments herein is for describing particular example embodiments only and is not intended to be limiting. As used in the description of the example embodiments and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. Also note that the term “and/or” as used herein refers to and encompasses any and/or all possible combinations of one or more of the associated listed items. Furthermore, the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, blocks, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, blocks, steps, operations, elements, components, and/or groups thereof.

As used herein, the term “if” may be construed to mean “when” or “upon” or “in response to determining” or “in response to detecting,” depending on the context. Similarly, the phrase “if it is determined” or “if [a stated condition or event] is detected” may be construed to mean “upon determining” or “in response to determining” or “upon detecting [the stated condition or event]” or “in response to detecting [the stated condition or event],” depending on the context.

As will be understood by those skilled in the art, the invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. Likewise, the particular naming and division of the portions, modules, servers, managers, components, functions, procedures, actions, layers, features, attributes, methodologies, data structures and other aspects are not mandatory or significant, and the mechanisms that implement the invention or its features may have different names, divisions and/or formats. The foregoing description, for the purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or limiting to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain relevant principles and their practical applications, to thereby enable others skilled in the art to best utilize various embodiments with or without various modifications as may be suited to the particular use contemplated. 

What is claimed is:
 1. A computer-implemented method, comprising: encoding a data object into a plurality of encoded data fragments; transmitting each one of the plurality of encoded data fragments to a backend of a distributed encoded data storage system, for distribution across a plurality of backend storage elements, wherein the distribution of the plurality of encoded data fragments across the plurality of backend storage elements provides a specific level of redundancy; for each one of the plurality of encoded data fragments, storing a corresponding fragment identifier and associated metadata in a first section of a single metadata unit; storing an initial part of the data object in a second section of the single metadata unit; encoding the single metadata unit into a plurality of encoded metadata fragments; transmitting each one of the plurality of encoded metadata fragments to the backend of the distributed encoded data storage system for distribution across the plurality of backend storage elements, wherein the distribution of the plurality of encoded metadata fragments across the plurality of backend storage elements provides the specific level of redundancy; associating fragment identifiers corresponding to the encoded metadata fragments with the data object; storing the fragment identifiers corresponding to the encoded metadata fragments in a frontend storage element, wherein the frontend storage element provides faster access to stored content than backend storage elements; receiving a request from a client application to access the data object; retrieving the fragment identifiers corresponding to the encoded metadata fragments from the frontend storage element, responsive to receiving the request to access the data object from the client application; retrieving the encoded metadata fragments from the backend of the distributed encoded data storage system; decoding the single metadata unit using the retrieved encoded metadata fragments; retrieving the initial part of the data object from the second section of the single metadata unit; and transmitting the initial part of the data object to the client application to begin processing the data object.
 2. The computer-implemented method of claim 1, wherein transmitting the initial part of the data object to the client application to begin processing the data object further comprises: transmitting the initial part of the data object to the client application to begin processing the data object, prior to or concurrently with retrieving the encoded metadata fragments from the backend of the distributed encoded data storage system.
 3. The computer-implemented method of claim 1, wherein to begin processing the data object further comprises one of: to display a preview of the data object on a screen of a computing device; and to launch an end-user application for opening the data object.
 4. The computer-implemented method of claim 1, wherein the initial part of the data object further comprises: a header of the data object; and a subsection of content of the data object.
 5. The computer-implemented method of claim 1, wherein the initial part of the data object further comprises: information concerning a format of the data object; and a subsection of content of the data object.
 6. The computer-implemented method of claim 1, further comprising: retrieving, for each one of the plurality of encoded data fragments, the corresponding fragment identifier and associated metadata from the first section of the single metadata unit; retrieving encoded data fragments from the backend of the distributed encoded data storage system, utilizing corresponding fragment identifiers and associated metadata; and transmitting encoded data fragments to the client application to decode the data object, subsequently to transmitting the initial part of the data object to the client application to begin processing the data object.
 7. The computer-implemented method of claim 1, further comprising: retrieving, for each one of the plurality of encoded data fragments, the corresponding fragment identifier and associated metadata from the first section of the single metadata unit; retrieving encoded data fragments from the backend of the distributed encoded data storage system, utilizing corresponding fragment identifiers and associated metadata; decoding the data object using the retrieved encoded data fragments; and transmitting the data object to the client application, subsequently to transmitting the initial part of the data object to the client application to begin processing the data object.
 8. The computer-implemented method of claim 1, wherein: the backend storage elements further comprise electromechanical storage devices.
 9. The computer-implemented method of claim 1, wherein: the frontend storage element further comprises a solid state storage device.
 10. The computer-implemented method of claim 1, wherein providing a specific level of redundancy further comprises: providing a predetermined level of storage redundancy associated with the distributed encoded data storage system.
 11. The computer-implemented method of claim 1, wherein: metadata associated with each specific encoded data fragment of the plurality of encoded data fragments further comprises a storage location of the specific encoded data fragment and encoding information concerning the specific encoded data fragment.
 12. The computer-implemented method of claim 1, further comprising: transmitting a copy of content of the frontend storage element to the backend of the distributed encoded data storage system for storage; retrieving a stored copy from the backend, responsive to a loss of content of the frontend storage element; and storing the retrieved stored copy on the frontend storage element.
 13. The computer-implemented method of claim 1, further comprising: in response to transmitting each one of the plurality of encoded data fragments, receiving, for each one of the plurality of encoded data fragments, a corresponding fragment identifier from the backend of the distributed encoded data storage system; and in response to transmitting each one of the plurality of encoded metadata fragments, receiving, for each one of the plurality of encoded metadata fragments, a corresponding fragment identifier from the backend of the distributed encoded data storage system.
 14. A system, comprising: one or more processors; and a storage manager executable by the one or more processors to perform operations comprising: encoding a single metadata unit storing a part of a data object into a plurality of encoded metadata fragments; transmitting the plurality of encoded metadata fragments to a backend of a distributed encoded data storage system for distributed storage across a plurality of backend storage elements; responsive to receiving a request to access the data object, retrieving a plurality of fragment identifiers corresponding to the plurality of encoded metadata fragments from a frontend storage element; retrieving the encoded metadata fragments from the backend of the distributed encoded data storage system; decoding the single metadata unit using the retrieved encoded metadata fragments; retrieving the part of the data object from the single metadata unit; and transmitting the part of the data object to a requesting application to begin processing the data object.
 15. The system of claim 14, wherein the storage manager is executable by the one or more processors to perform operations further comprising: encoding the data object into a plurality of encoded data fragments; transmitting the plurality of encoded data fragments of the data object to a backend of a distributed encoded data storage system for distributed storage across a plurality of backend storage elements; storing the plurality of fragment identifiers and associated metadata corresponding to the plurality of encoded data fragments in a first section of the single metadata unit; and storing the part of the data object in a second section of the single metadata unit.
 16. The system of claim 15, wherein the storage manager is executable by the one or more processors to perform operations further comprising: retrieving the plurality of fragment identifiers and associated metadata from the first section of the single metadata unit; retrieving the plurality of encoded data fragments from the backend of the distributed encoded data storage system, utilizing the plurality of fragment identifiers and associated metadata; and transmitting the plurality of encoded data fragments to the requesting application subsequent to transmitting the part of the data object to the requesting application.
 17. The system of claim 15, wherein the storage manager is executable by the one or more processors to perform operations further comprising: retrieving the plurality of fragment identifiers and associated metadata from the first section of the single metadata unit; retrieving the plurality of encoded data fragments from the backend of the distributed encoded data storage system, utilizing the plurality of fragment identifiers and associated metadata; decoding the data object using the retrieved plurality of encoded data fragments; and transmitting the data object to the requesting application subsequent to transmitting the part of the data object to the requesting application.
 18. The system of claim 14, wherein transmitting the part of the data object to the requesting application to begin processing the data object further comprises: transmitting the part of the data object to the requesting application to begin processing the data object, prior to or concurrently with retrieving the encoded metadata fragments from the backend of the distributed encoded data storage system.
 19. A system, comprising: means for encoding a single metadata unit storing a part of a data object into a plurality of encoded metadata fragments; means for transmitting the plurality of encoded metadata fragments to a backend of a distributed encoded data storage system for distributed storage across a plurality of backend storage elements; means for retrieving a plurality of fragment identifiers corresponding to the plurality of encoded metadata fragments from a frontend storage element, responsive to receiving a request to access the data object; means for retrieving the encoded metadata fragments from the backend of the distributed encoded data storage system; means for decoding the single metadata unit using the retrieved encoded metadata fragments; means for retrieving the part of the data object from the single metadata unit; and means for transmitting the part of the data object to a requesting application to begin processing the data object.
 20. The system of claim 19, further comprising: means for encoding the data object into a plurality of encoded data fragments; means for transmitting the plurality of encoded data fragments of the data object to a backend of a distributed encoded data storage system for distributed storage across a plurality of backend storage elements; means for storing the plurality of fragment identifiers and associated metadata corresponding to the plurality of encoded data fragments in a first section of the single metadata unit; and means for storing the part of the data object in a second section of the single metadata unit. 